This paper introduces the notion of an "initial alignment" (INAL) between a neural network at initialization and a target function. It is proved that if a network and a Boolean target function have no noticeable INAL, then noisy gradient descent on a fully connected network with normalized i.i.d. initialization will not learn the target in polynomial time. Consequently, a certain degree of knowledge about the target (as measured by the INAL) is needed in the architecture design. This also answers an open problem posed in [AS20]. The results are based on lower bounds for descent algorithms on symmetric neural networks that operate without explicit knowledge of the target function.
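As a rough illustration (not the paper's exact definition), one can estimate an alignment of this kind by Monte Carlo: measure the largest correlation between a Boolean target and the neurons of a randomly initialized network. The sketch below uses a hypothetical 3-parity target, which is the kind of function that has vanishing correlation with i.i.d.-initialized neurons.

```python
import numpy as np

rng = np.random.default_rng(0)

n, width, samples = 20, 256, 10_000              # input dim, hidden width, MC samples
X = rng.choice([-1.0, 1.0], size=(samples, n))   # uniform Boolean inputs

# Hypothetical target: a 3-parity, essentially uncorrelated with any
# single neuron at initialization, hence "misaligned" in the INAL sense.
y = X[:, 0] * X[:, 1] * X[:, 2]

# Fully connected layer with normalized i.i.d. Gaussian initialization.
W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, width))
b = rng.normal(0.0, 1.0, size=width)
H = np.tanh(X @ W + b)                           # neuron outputs at initialization

# Monte Carlo alignment: max absolute correlation between target and a neuron.
corr = np.abs(H.T @ y) / samples
print(f"estimated alignment: {corr.max():.4f}")  # near 0 for a parity target
```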
How does the geometric representation of a dataset change after each randomly initialized layer of a neural network is applied? The celebrated Johnson-Lindenstrauss lemma answers this question for linear fully connected neural networks (FNNs), stating that the geometry is essentially preserved. For FNNs with ReLU activation, the angle between two inputs contracts according to a known mapping. The question becomes more intricate for non-linear convolutional neural networks (CNNs). To answer it, we introduce a geometric framework. For linear CNNs, we show that the Johnson-Lindenstrauss lemma continues to hold, namely that the angle between two inputs is preserved. For CNNs with ReLU activation, on the other hand, the behavior is richer: the angle between the outputs contracts, and the level of contraction depends on the nature of the inputs. In particular, after one layer the geometry of natural images is essentially preserved, whereas for Gaussian correlated inputs, CNNs exhibit the same contracting behavior as FNNs with ReLU activation.
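For the FNN case, the "known mapping" for wide random ReLU layers is the first-order arc-cosine kernel of Cho & Saul: if two unit inputs meet at angle θ, the post-layer cosine is (sin θ + (π − θ) cos θ)/π. A small sketch iterating this map shows the contraction with depth that the abstract contrasts against the CNN behavior:

```python
import numpy as np

def relu_angle_map(theta: float) -> float:
    """Angle between two inputs after one wide random ReLU layer
    (first-order arc-cosine kernel, Cho & Saul 2009)."""
    c = (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

theta = np.pi / 2            # start with orthogonal inputs
for layer in range(1, 6):
    theta = relu_angle_map(theta)
    print(f"layer {layer}: angle = {theta:.4f} rad")
# angles contract toward 0 with depth, the ReLU-FNN behavior that linear
# CNNs (and, for natural images, one-layer ReLU CNNs) do not exhibit
```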
We study the implicit bias of ReLU neural networks trained by a variant of SGD in which, at each step, the label is changed to a random label with probability $p$ (label smoothing being a close variant of this procedure). Our experiments show that label noise pushes the network toward sparse solutions in the following sense: for a typical input, only a small fraction of the neurons are active, and the firing pattern of the hidden layers is sparse. In fact, for some instances an appropriate amount of label noise not only sparsifies the network but also reduces the test error. We then turn to a theoretical analysis of these sparsification mechanisms, focusing on the extremal case $p = 1$. We show that in this case the network becomes sparse, as the experiments suggest, but, surprisingly, in different ways depending on the learning rate and the presence of biases, with either the weights vanishing or the neurons ceasing to fire.
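A minimal sketch of the training procedure described above, on hypothetical toy data (a two-layer ReLU net; the architecture and hyperparameters are illustrative assumptions, not the paper's setup), with a final probe of how many hidden neurons fire:

```python
import torch

torch.manual_seed(0)
p = 0.5                                        # label-flip probability
X = torch.randn(512, 10)                       # toy inputs
y = (X[:, 0] > 0).long()                       # toy binary labels

model = torch.nn.Sequential(
    torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(1000):
    idx = torch.randint(0, 512, (32,))         # mini-batch
    xb, yb = X[idx], y[idx].clone()
    flip = torch.rand(32) < p                  # with probability p, replace
    yb[flip] = torch.randint(0, 2, (int(flip.sum()),))  # ...the label at random
    loss = torch.nn.functional.cross_entropy(model(xb), yb)
    opt.zero_grad(); loss.backward(); opt.step()

# Sparsity probe: fraction of hidden neurons firing on typical inputs.
h = torch.relu(model[0](X))
print("active fraction:", (h > 0).float().mean().item())
```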
Modeling lies at the core of both the financial and the insurance industry for a wide variety of tasks. The rise and development of machine learning and deep learning models have created many opportunities to improve our modeling toolbox. Breakthroughs in these fields often come with the requirement of large amounts of data. Such large datasets are often not publicly available in finance and insurance, mainly due to privacy and ethics concerns. This lack of data is currently one of the main hurdles in developing better models. One possible option for alleviating this issue is generative modeling. Generative models are capable of simulating fake but realistic-looking data, also referred to as synthetic data, that can be shared more freely. Generative Adversarial Networks (GANs) are one such model, increasing our capacity to fit very high-dimensional distributions of data. While research on GANs is an active topic in fields like computer vision, they have found limited adoption within the human sciences, such as economics and insurance. One reason for this is that in these fields most questions are inherently about the identification of causal effects, while to this day neural networks, which are at the center of the GAN framework, focus mostly on high-dimensional correlations. In this paper we study the causal preservation capabilities of GANs and whether the produced synthetic data can reliably be used to answer causal questions. This is done by performing causal analyses on the synthetic data produced by a GAN, under increasingly lenient assumptions. We consider the cross-sectional case, the time series case, and the case with a complete structural model. We show that in the simple cross-sectional scenario, where correlation equals causation, the GAN preserves causality, but that challenges arise for more advanced analyses.
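The cross-sectional check can be illustrated with a sketch: estimate the same causal effect on real and on synthetic data and compare. Everything here is a stand-in under stated assumptions: the structural model is hypothetical, and a Gaussian resampler plays the role of the trained GAN (a real experiment would train a GAN instead).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cross-sectional SCM: x causes y with true effect 2.0.
n = 5000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)
real = np.column_stack([x, y])

# Stand-in for GAN output: resample from a Gaussian fit to the real data.
mean, cov = real.mean(axis=0), np.cov(real.T)
synth = rng.multivariate_normal(mean, cov, size=n)

def ols_effect(data):
    """OLS slope of y on x: a valid causal estimate in the simple
    scenario where correlation equals causation."""
    x_, y_ = data[:, 0], data[:, 1]
    return np.cov(x_, y_)[0, 1] / np.var(x_, ddof=1)

print("real effect:     ", ols_effect(real))    # ~2.0
print("synthetic effect:", ols_effect(synth))   # close if causality is preserved
```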
Deep learning models are known to put the privacy of their training data at risk, which poses challenges for their safe and ethical release to the public. Differentially private stochastic gradient descent is the de facto standard for training neural networks without leaking sensitive information about the training data. However, applying it to models for graph-structured data poses a novel challenge: unlike with i.i.d. data, sensitive information about a node in a graph can leak not only through its own gradients but also through the gradients of all nodes within a larger neighborhood. In practice, this limits privacy-preserving deep learning on graphs to very shallow graph neural networks. We propose to solve this issue by training graph neural networks on disjoint subgraphs of a given training graph. We develop three random-walk-based methods for generating such disjoint subgraphs and perform a careful analysis of the data-generating distributions to provide strong privacy guarantees. Through extensive experiments, we show that our method greatly outperforms the state-of-the-art baseline on three large graphs, and matches or outperforms it on four smaller ones.
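One plausible instantiation of random-walk-based disjoint subgraph sampling (an illustrative sketch, not necessarily one of the paper's three methods): grow each subgraph by a random walk that only visits nodes not yet assigned elsewhere, so no node contributes gradients to two subgraphs.

```python
import random

def disjoint_random_walk_subgraphs(adj, walk_len, seed=0):
    """Partition a graph's nodes into disjoint subgraphs, each grown by a
    random walk that only visits nodes not yet assigned to any subgraph."""
    rng = random.Random(seed)
    unassigned = set(adj)
    subgraphs = []
    while unassigned:
        node = rng.choice(sorted(unassigned))   # fresh start node
        walk = [node]
        unassigned.discard(node)
        for _ in range(walk_len - 1):
            candidates = [v for v in adj[walk[-1]] if v in unassigned]
            if not candidates:                  # walk is stuck; close subgraph
                break
            node = rng.choice(candidates)
            walk.append(node)
            unassigned.discard(node)
        subgraphs.append(walk)
    return subgraphs

# toy graph: a 6-cycle
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(disjoint_random_walk_subgraphs(adj, walk_len=3))
```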
Data-driven models such as neural networks are being applied more and more to safety-critical applications, such as the modeling and control of cyber-physical systems. Despite the flexibility of the approach, there are still concerns about the safety of these models in this context, as well as the need for large amounts of potentially expensive data. In particular, when long-term predictions are needed or frequent measurements are not available, the open-loop stability of the model becomes important. However, it is difficult to make such guarantees for complex black-box models such as neural networks, and prior work has shown that model stability is indeed an issue. In this work, we consider an aluminum extraction process where measurements of the internal state of the reactor are time-consuming and expensive. We model the process using neural networks and investigate the role of including skip connections in the network architecture as well as using $\ell_1$ regularization to induce sparse connection weights. We demonstrate that these measures can greatly improve both the accuracy and the stability of the models for datasets of varying sizes.
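A minimal sketch of the two architectural measures, on hypothetical toy data (the model shape and hyperparameters are illustrative assumptions): a skip connection gives the network a linear path from input to output, and an $\ell_1$ penalty added to the loss drives connection weights toward sparsity.

```python
import torch

class SkipMLP(torch.nn.Module):
    """Toy process model with a skip connection from input to output."""
    def __init__(self, dim_in, dim_hidden, dim_out):
        super().__init__()
        self.hidden = torch.nn.Sequential(
            torch.nn.Linear(dim_in, dim_hidden), torch.nn.ReLU(),
            torch.nn.Linear(dim_hidden, dim_out))
        self.skip = torch.nn.Linear(dim_in, dim_out, bias=False)

    def forward(self, x):
        return self.hidden(x) + self.skip(x)   # linear path + nonlinear residual

model = SkipMLP(8, 64, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(256, 8), torch.randn(256, 1)

l1_weight = 1e-4                               # strength of the sparsity penalty
for _ in range(200):
    mse = torch.nn.functional.mse_loss(model(x), y)
    l1 = sum(p.abs().sum() for p in model.parameters())  # l1 over all weights
    loss = mse + l1_weight * l1
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final mse: {mse.item():.4f}")
```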
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing such similarity. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotating entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
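A rough illustrative proxy for this idea (not the paper's exact technique): compute inter- and intra-rater agreement on binary masks and treat the lower value as a ceiling beyond which gains in reference similarity are unlikely to reflect real-world improvement. The synthetic annotators below are assumptions for the sketch.

```python
import numpy as np

def dice(a, b):
    """Dice similarity between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

rng = np.random.default_rng(0)
truth = rng.random((64, 64)) > 0.5             # underlying structure
noisy = lambda: np.logical_xor(truth, rng.random((64, 64)) > 0.9)

rater1_a, rater1_b, rater2 = noisy(), noisy(), noisy()
intra = dice(rater1_a, rater1_b)               # same rater, repeated annotation
inter = dice(rater1_a, rater2)                 # two different raters

pgt_estimate = min(intra, inter)               # agreement ceiling as PGT proxy
print(f"intra={intra:.3f} inter={inter:.3f} PGT~{pgt_estimate:.3f}")
# model-vs-reference similarity above this ceiling likely stops
# translating into better real-world performance
```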
Explainable AI transforms opaque decision strategies of ML models into explanations that are interpretable by the user, for example, identifying the contribution of each input feature to the prediction at hand. Such explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by finding relevant subspaces in activation space that can be mapped to more abstract human-understandable concepts and enable a joint attribution on concepts and input features. To automatically extract the desired representation, we propose new subspace analysis formulations that extend the principle of PCA and subspace analysis to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), optimize relevance of projected activations rather than the more traditional variance or kurtosis. This enables a much stronger focus on subspaces that are truly relevant for the prediction and the explanation, in particular, ignoring activations or concepts to which the prediction model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. Our proposed methods prove to be practically useful and compare favorably to the state of the art, as demonstrated on benchmarks and three use cases.
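A heavily simplified sketch of the general idea, not the authors' exact objective: instead of the PCA direction that maximizes variance, pick the direction maximizing a symmetrized activation-relevance cross-covariance, so the subspace captures relevance rather than variance. The random activations and relevance scores below are stand-ins (in practice relevances would come from Shapley Values, Integrated Gradients, or LRP).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 16
A = rng.normal(size=(n, d))                    # activations at some layer
R = A * rng.uniform(0, 1, size=d)              # stand-in per-dimension relevances

# PCA direction: maximizes variance of the projected activations.
pca_dir = np.linalg.eigh(A.T @ A / n)[1][:, -1]

# Relevance-driven direction: maximizes a symmetrized activation-relevance
# cross-covariance instead of variance (simplified PRCA-style objective).
M = (A.T @ R + R.T @ A) / (2 * n)
prca_dir = np.linalg.eigh(M)[1][:, -1]

proj_rel = lambda u: np.sum((A @ u) * (R @ u))  # relevance of the projection
print("relevance captured by PCA :", proj_rel(pca_dir))
print("relevance captured by PRCA:", proj_rel(prca_dir))
```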
Cybercriminals are moving towards zero-day attacks affecting resource-constrained devices such as single-board computers (SBC). Assuming that perfect security is unrealistic, Moving Target Defense (MTD) is a promising approach to mitigate attacks by dynamically altering target attack surfaces. Still, selecting suitable MTD techniques for zero-day attacks is an open challenge. Reinforcement Learning (RL) could be an effective approach to optimize the MTD selection through trial and error, but the literature falls short when it comes to i) evaluating the performance of RL and MTD solutions in real-world scenarios, ii) studying whether behavioral fingerprinting is suitable for representing SBCs' states, and iii) calculating the resource consumption on SBCs. To address these limitations, the work at hand proposes an online RL-based framework to learn the correct MTD mechanisms for mitigating heterogeneous zero-day attacks on SBCs. The framework considers behavioral fingerprinting to represent SBCs' states and RL to learn MTD techniques that mitigate each malicious state. It has been deployed on a real IoT crowdsensing scenario with a Raspberry Pi acting as a spectrum sensor. In more detail, the Raspberry Pi has been infected with different samples of command and control malware, rootkits, and ransomware to later select between four existing MTD techniques. A set of experiments demonstrated the suitability of the framework for learning proper MTD techniques that mitigate all attacks (except a harmful rootkit) while consuming <1 MB of storage and utilizing <55% CPU and <80% RAM.
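A minimal sketch of the RL core under stated assumptions: the fingerprinted states, the four MTD action names, and the mitigation simulator are all hypothetical placeholders, and a one-step (bandit-style) Q update stands in for the full online RL loop.

```python
import random

random.seed(0)
STATES = ["benign", "c2_malware", "rootkit", "ransomware"]     # fingerprint classes
ACTIONS = ["ip_shuffle", "service_restart", "lib_rotation", "file_trap"]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, eps = 0.1, 0.2                          # learning rate, exploration rate

def mitigated(state, action):
    """Stand-in for the real deployment: True if the chosen MTD technique
    neutralizes the fingerprinted malicious state."""
    best = {"c2_malware": "ip_shuffle", "rootkit": "lib_rotation",
            "ransomware": "file_trap"}
    return best.get(state) == action

for episode in range(2000):
    s = random.choice(STATES[1:])              # observed malicious fingerprint
    a = (random.choice(ACTIONS) if random.random() < eps        # explore
         else max(ACTIONS, key=lambda a: Q[(s, a)]))            # exploit
    r = 1.0 if mitigated(s, a) else -1.0       # reward from post-MTD fingerprint
    Q[(s, a)] += alpha * (r - Q[(s, a)])       # one-step Q update

for s in STATES[1:]:                           # learned state -> MTD mapping
    print(s, "->", max(ACTIONS, key=lambda a: Q[(s, a)]))
```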
Vision-based tactile sensors have gained extensive attention in the robotics community. These sensors are expected to be capable of extracting contact information, i.e., haptic information, during in-hand manipulation, which makes them a perfect match for haptic feedback applications. In this paper, we propose a contact force estimation method using the vision-based tactile sensor DIGIT, and apply it to a position-force teleoperation architecture for force feedback. Force estimation is done by building a depth map to measure the deformation of the DIGIT gel surface and applying a regression algorithm to the estimated depth data and ground-truth force data to obtain the depth-force relationship. The experiment is performed by constructing a grasping force feedback system with a haptic device as a leader robot and a parallel robot gripper as a follower robot, where the DIGIT sensor is attached to the tip of the robot gripper to estimate the contact force. The preliminary results show the capability of using the low-cost vision-based sensor for force feedback applications.
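The regression step can be sketched as follows. The calibration data here is synthetic and the linear depth-force relation is a hypothetical assumption; in practice the depth statistic would come from the DIGIT depth-map reconstruction and the force labels from a reference force sensor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in calibration data: mean gel indentation depth (mm) from the
# reconstructed depth map, paired with ground-truth contact force (N).
depth = rng.uniform(0.0, 1.5, size=200)
force = 3.2 * depth + rng.normal(scale=0.05, size=200)   # hypothetical relation

# Fit the depth-force relationship with least-squares regression.
coeffs = np.polyfit(depth, force, deg=1)
predict_force = np.poly1d(coeffs)

print(f"fitted model: F = {coeffs[0]:.2f} * d + {coeffs[1]:.2f}")
print(f"estimated force at d=0.8 mm: {predict_force(0.8):.2f} N")
```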